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Abstract 



Functional Magnetic Resonance Imaging (fMRI) provides dynamical access into 
the complex functioning of the human brain, detailing the hemodynamic activ- 
ity of thousands of voxels during hundreds of sequential time points. One ap- 
proach towards illuminating the connection between fMRI and cognitive function 
is through decoding; how do the time series of voxel activities combine to provide 
information about internal and external experience? Here we seek models of fMRI 
decoding which are balanced between the simplicity of their interpretation and the 
effectiveness of their prediction. We use signals from a subject immersed in vir- 
tual reality to compare global and local methods of prediction applying both linear 
and nonlinear techniques of dimensionality reduction. We find that the prediction 
of complex stimuli is remarkably low-dimensional, saturating with less than 100 
features. In particular, we build effective models based on the decorrelated com- 
ponents of cognitive activity in the classically-defined Brodmann areas. For some 
of the stimuli, the top predictive areas were surprisingly transparent, including 
Wernicke's area for verbal instructions, visual cortex for facial and body features, 
and visual-temporal regions for velocity. Direct sensory experience resulted in 
the most robust predictions, with the highest correlation (c ~ 0.8) between the 
predicted and experienced time series of verbal instructions. Techniques based on 
non-linear dimensionality reduction (Laplacian eigenmaps) performed similarly. 
The interpretability and relative simplicity of our approach provides a conceptual 
basis upon which to build more sophisticated techniques for fMRI decoding and 
offers a window into cognitive function during dynamic, natural experience. 



* Advances in Neural Information Processing Systems 20, Scholkopf B., Piatt J. and Hofmann T. (Editors), 
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1 Introduction 



Functional Magnetic Resonance Imaging (fMRI) is a non-invasive imaging technique that can quan- 
tify changes in cerebral venous oxygen concentration. Changes in the flMRI signal that occur during 
brain activation are very small (1-5%) and are often contaminated by noise (created by the imaging 
system hardware or physiological processes). Statistical techniques that handle the stochastic nature 
of the data are commonly used for the detection of activated voxels. Traditional methods of analy- 
sis - which are designed to test the hypothesis that a simple cognitive or sensory stimulus creates 
changes in a specific brain area - are unable to analyze fMRI datasets collected in "natural stimuli" 
where the subjects are bombarded with a multitude of uncontrolled stimuli that cannot always be 
quantified im II. 

The Experience Based Cognition competition (EBC) |3 | offers an opportunity to study complex re- 
sponses to natural environments, and to test new ideas and new methods for the analysis of fMRI 
collected in natural environments. The EBC competition provides fMRI data of three human sub- 
jects in three 20-minute segments (704 scanned samples in each segment) in an urban virtual reality 
environment along with quantitative time series of natural stimuli or features (25 in total) ranging 
from objective features such as the presence of faces to self-reported, subjective cognitive states 
such as the experience of fear. During each session, subjects were audibly instructed to complete 
three search tasks in the environment: looking for weapons (but not tools) taking pictures of people 
with piercings (but not others), or picking up fruits (but not vegetables). The data was collected with 
a 3T EPI scanner and typically consists of the activity of 35000 volume elements (voxels) within 
the head. The feature time series was provided for only the first two segments (1408 time samples) 
and competitive entries are judged on their ability to predict the feature on the third segment (704 
time samples, see Fig. [T]). At a microscopic level, a large number of internal variables associated 



Figure 1: We study the variation of the set of features fk{t)^k = 1, • • • , i^T as a function of the 
dynamical changes in the fMRI signal X(t) = [xi(t), • • • , XAr(t)] during natural experience. The 
features represent both external stimuli such as the presence of faces and internal emotional states 
encountered during the exploration of a virtual urban environment (left and right images). We predict 
the feature functions fk for t = T/+i, • • • T, from the knowledge of the entire fMRI dataset A', and 
the partial knowledge of fk{t) for t = 1, • • • , T^ The "toy" activation patterns (middle diagram) 
illustrate the changes in "brain states" occurring as a function of time. 



with various physical and physiological phenomena contribute to the dynamic changes in the fMRI 
signal. Because the fMRI signal is a large scale (as compared to the scale of neurons) measurement 
of neuronal activity, we expect that many of these variables will be coupled resulting in a low di- 
mensional set for all possible configurations of the activated fMRI signal. In this work we seek a 
low dimensional representation of the entire fMRI dataset that provides a new set of 'voxel-free" 
coordinates to study cognitive and sensory features. 

We denote a three-dimensional volumes of fMRI composed of a total of N voxels by X(t) = 
[xi(t), • • • , XAr(t)]. We have access to T such volumes. We can stack the spatio-temporal fMRI 
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dataset into slN x T matrix, 



a: 



"xi(l) ••• xiiT) 



(1) 



where each row n represents a time series generated from voxel n and each column j represents 
a scan acquired at time tj. We call the set of features to be predicted fk^k = 1, , • • • jK. We are 
interested in studying the variation of the set of features fk{t)^ k = 1, • - ■ , K describing the subject 
experience as a function of the dynamical changes of the brain, as measured by X(t). Formally, we 
need to build predictions of fk {t) for t = T^+i , • • • T, from the knowledge of the entire fMRI dataset 
A', and the partial knowledge of fk{t) for the training time samples t = 1, • • • , (see Fig. [T]). 




Figure 2: Low-dimensional parametrization of the set of "brain states". The parametrization is 
constructed from the samples provided by the fMRI data at different times, and in different states. 



2 A voxel-free parametrization of brain states 

We use here the global information provided by the dynamical evolution of X(t) over time, both 
during the training times and the test times. We would like to effectively replace each fMRI dataset 
X(t) by a small set of features that facilitates the identification of the brain states, and make the 
prediction of the features easier. Formally, our goal is to construct a map (j) from the voxel space to 
low dimensional space. 

^ : P C (2) 

X(t) = [xi{t)r-- ,XN{t)f ^{yi{t)r-- .yL{t)), (3) 

where L <^ N. As t varies over the training and the test sets, we hope that we explore most of 
the possible brain configurations that are useful for predicting the features. The map (j) provides a 
parametrization of the brain states. Figure [2] provides a pictorial rendition of the map (j). The range 
V, represented in Fig. |2]as a smooth surface, is the set of parameters ^i, • • • , that characterize 
the brain dynamics. Different values of the parameters produce different "brain states", associated 
with different patterns of activation. Note that time does not play any role on V, and neighboring 
points on V correspond to similar brain states. Equipped with this re-parametrization of the dataset 
X, the goal is to learn the evolution of the feature time series as a function of the new coordinates 
[^i(t), • • • , ^l(^)]^. Each feature function is an implicit function of the brain state measured by 
[yi{t)j • • • ^yL{t)]- For a given feature fk, the training data provide us with samples of fk at cer- 
tain locations in V. The map (j) is build by globally computing a new parametrization of the set 
{X(l), • • • , X(T)}. This parametrization is built into two stages. First, we construct a graph that is 
a proxy for the entire set of fMRI data {X(l), • • • , X(T)}. Second, we compute some eigenfunc- 
tions (j)k defined on the graph. Each eigenfunctions provides one specific coordinate for each node 
of the graph. 
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2.1 The graph of brain states 

We represent the fMRI dataset for the training times and test times by a graph. Each vertex i 
corresponds to a time sample and we compute the distance between two vertices i and j by 
measuring a distance between X(ti) and X(tj). Global changes in the signal due to residual head 
motion, or global blood flow changes were removed by computing a a principal components analysis 
(PC A) of the dataset X and removing a small number components. We then used the P distance 
between the fMRI volumes (unrolled sls Nxl vectors). This distance compares all the voxels (white 
and gray matter, as well as CSF) inside the brain. 

2.2 Embedding of the dataset 

Once the network of connected brain states is created, we need a distance to distinguish between 
strongly connected states (the two fMRI data are in the same cognitive state) and weakly connected 
states (the fMRI data are similar, but do not correspond to the same brain states). The Euclidean 
distance used to construct the graph is only useful locally: we can use it to compare brain states 
that are very similar, but is unfortunately very sensitive to short-circuits created by the noise in the 
data. A standard alternative to the geodesic (shortest distance) is provided by the average commute 
time, K,{i^j), that quantifies the expected path length between i and j for a random walk started at i. 
Formally, = i) + H{i^j), where H{i,j) is the hitting time, 

H{iJ) = E,[Tj] with Tj=mm{n>0;Zn=j}, 

for a random walk Zn on the graph with transition probability P, defined by Pij = Wij/di, and 
di = Di^i = Wij is the degree of the vertex i. The commute time can be conveniently computed 

from the eigenfunctions , • • • , (I)n of N = D ^ PD~ ^ , with the eigenvalues — 1 < A at • • • < A2 < 
Ai = 1. Indeed, we have 

N / \ ^ 

As proposed in ||4l[5l[6l, we define an embedding 

^^h{^) = y^^. k = 2,---,N (4) 

1 - A/e V^i 

Because — 1 < Aat • • • < A2 < Ai = 1, we have ^ , > ^ , > • • • -7=^. We can therefore 

V-L — A2 V-1- — A3 V-L — Atv 

neglect f.^ for large k, and reduce the dimensionality of the embedding by using only the first 

V -L — Afc 

K coordinates in Q. The spectral gap measures the difference between the first two eigenvalues, 
Ai — A2 = 1 — A2. A large spectral gap indicates that the low dimensional will provide a good 
approximation. The algorithm for the construction of the embedding is summarized in Fig. [3] 

Algorithm 1 : Construction of the embedding 



Input: 

- X(t), t = 1, • • • , T, i^: number of eigenfunctions. 
Algorithm: 

1 . construct the graph defined by the nearest neighbors 

2. find the first K eigenfunctions, of N 
• Output: FoYti = l:T 

- new co-ordinates of X{ti): ykiU) = -j^ -yr^ A: = 2, • • • , + 1 



Figure 3: Construction of the embedding 
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A parameter of the embedding (Fig. [5]) is the number of coordinates. K can be optimized 
by minimizing the prediction error. We expect that for small values of K the embedding will not 
describe the data with enough precision, and the prediction will be inaccurate. If K is too large, some 
of the new coordinates will be describing the noise in the dataset, and the algorithm will overfit the 
training data. Fig. |4]-(a) illustrates the effect of K on the performance of the nonlinear dimension 
reduction. The quality of the prediction for the features: faces, instruction and velocity is plotted 
against K. Instructions elicits a strong response in the auditory cortex that can be decoded with as 
few as 20 coordinates. Faces requires more (about 50) dimensions to become optimal. As expected 
the performance eventually drops when additional coordinates are used to describe variability that 
is not related to the features to be decoded. This confirms our hypothesis that we can replace about 
15,000 voxels with 50 appropriately chosen coordinates. 

2.3 Semi-supervised learning of the features 

The problem of predicting a feature fk at an unknown time tu is formulated as kernel ridge regres- 
sion problem. The training set {fk{t) fox t = 1, • • • , Ti} is used to estimate the optimal choice of 
weights in the following model, 

Ti 

where /C is a kernel and tu is a time point where we need to predict. 

2.4 Results 

We compared the nonlinear embedding approach (referred to as global Laplacian) to dimension 
reduction obtained with a PC A of X. Here the principal components are principal volumes, and for 
each time t we can expand X(t) onto the principal components. 

The 1408 training data were divided into two subsets of 704 time samples. We use fk{t) in a subset 
to predict fk (t) in the other subset. In order to quantify the stability of the prediction we randomly 
selected 85 % of the training set (first subset), and predicted 85 % of the testing set (other subset). 
The role, training or testing, of each subset of 704 time samples was also chosen randomly. We 
generated 20 experiments for each value of K, the number of predictors. The performance was 
quantified with the normalized correlation between the model prediction and the real value of fk, 

r = {Sfi^\t), SMt))/^{S{r,^^y){Sfi), (5) 

where 5fk = fk{t) — (fk)- Finally, r was averaged over the 20 experiments. Fig. |4]-(a) and (b) show 
the performance of the nonlinear method and linear method as a function of K. The approach based 
on the nonlinear embedding yields very stable results, with low variance. For both global methods 
the optimal performance is reached with less than 50 coordinates. Fig. [5] shows the correlation 
coefficients for 11 features, using K = 33 coordinates. For most features, the nonlinear embedding 
performed better than global PCA. 

3 From global to local 

While models based on global features leverage predictive components from across the brain, cog- 
nitive function is often localized within specific regions. Here we explore whether simple models 
based on classical Brodmann regions provide an effective decoder of natural experience. The Brod- 
mann areas were defined almost a century ago (see e.g |7 1) and divide the cortex into approximately 
50 regions, based on the structure and arrangement of neurons within each region. While the ar- 
eas are characterized structurally many also have distinct functional roles and we use these roles to 
provide useful interpretations of our predictive models. Though the partitioning of cortical regions 
remains an open and challenging problem, the Brodmann areas represent a transparent compromise 
between dimensionality, function and structure. 

Using data supplied by the competition, we warp each brain into standard Talairach coordinates and 
locate the Brodmann area corresponding to each voxel. Within each Brodmann region, differing in 
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size from tens to thousands of elements, we build the covariance matrix of voxel time series using 
all three virtual reality episodes. We then project the voxel time series onto the eigenvectors of the 
covariance matrix (principal components) and build a simple, linear stimulus decoding model using 
the top n modes ranked by their eigenvalues. 



(6) 



where k indexes the different Brodmann areas, {wf} are the linear weights and {m^{t)} are the 
mode time series in each region. The weights are chosen to minimize the RMS error on the training 
set and have a particularly simple form here as the modes are decorrelated, wf = {S{t)m^{t)). 
Performance is measured as the normalized correlation r (Eq. [5]) between the model prediction and 

(a) (b) 
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Figure 4: Performance of the prediction of natural experience for three features, faces, instructions 
and velocity as a function of the model dimension, (a) nonlinear embedding, (b) global principal 
components, (c) local (Brodmann area) principal components. In all cases we find that the predic- 
tion is remarkably low-dimensional, saturating with less than 100 features, (d) stability and inter- 
pretability of the optimal Brodmann areas used for decoding the presence of faces. All three areas 
are functionally associated with visual processing. Brodmann area 22 (Wernicke's area) is the best 
predictor of instructions (not shown). The connections between anatomy, function and prediction 
add an important measure of interpretability to our decoding models. 



the real stimulus averaged over the two virtual reality episodes and we use the region with the lowest 
training error to make the prediction. In principle, we could use a large number of modes to make a 
prediction with n limited only by the number of training samples. In practice the predictive power 
of our linear model saturates for a remarkably low number of modes in each region. In Fig|4jc) we 
demonstrate the performance of the model on the number of local modes for three stimuli that are 
predicted rather well (faces, instructions and velocity). 



6 



For many of the well-predicted stimuli, the best Brodmann areas were also stable across subjects and 
episodes offering important interpretability. For example, in the prediction of instructions (which 
the subjects received through headphones), the top region was Brodmann Area 22, Wernicke's area, 
which has long been associated with the processing of human language. For the prediction of the 
face stimulus the best region was usually visual cortex (Brodmann Areas 17 and 19) and for the 
prediction of velocity it was Brodmann Area 7, known to be important for the coordination of visual 
and motor activity. Using modes derived from Laplacian eigenmaps we were also able to predict an 
emotional state, the self-reporting of fear and anxiety. Interestingly, in this case the best predictions 
came from higher cognitive areas in frontal cortex, Brodmann Area 1 1 . 
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Figure 5: Performance of the prediction of natural experience for eleven features, using three differ- 
ent methods. Local decoders do well on stimuli related to objects while nonlinear global methods 
better capture stimuli related to emotion. 

While the above discussion highlights the usefulness of classical anatomical location, many aspects 
of cognitive experience are not likely to be so simple. Given the reasonable results above it's natural 
to look for ways of combining the intuition derived from single classical location with more global 
methods that are likely to do better in prediction. As a step in this direction, we modify our model 
to include multiple Brodmann areas 

n 

leA 2=1 

where A represents a collection of areas. To make a prediction using the modified model we find the 
top three Brodmann areas as before (ranked by their training correlation with the stimulus) and then 
incorporate all of the modes in these areas (nA in total) in the linear model of Eq [7] The weights 
{wl} are chosen to minimize RMS error on the training data. The combined model leverages both 
the interpretive power of single areas and also some of the interactions between them. The results 
of this combined predictor are shown in Fig. [5] (black) and are generally significantly better than 
the single region predictions. For ease of comparison, we also show the best global results (both 
nonlinear Laplacian and global principal components). For many (but not all) of the stimuli, the 
local, low-dimensional linear model is significantly better than both linear and nonlinear global 
methods. 
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4 Discussion 



Incorporating the knowledge of functional, cortical regions, we used fMRI to build low-dimensional 
models of natural experience that performed surprisingly well at predicting many of the complex 
stimuli in the EBC competition. In addition, the regional basis of our models allows for transparent 
cognitive interpretation, such as the emergence of Wernicke's area for the prediction of auditory 
instructions in the virtual environment. Other well-predicted experiences include the presence of 
body parts and faces, both of which were decoded by areas in visual cortex. In future work, it will 
be interesting to examine whether there is a well-defined cognitive difference between stimuli that 
can be decoded with local brain function and those that appear to require more global techniques. 
We also learned in this work that nonlinear methods for embedding datasets, inspired by manifold 
learning methods |4, 5, 6|, outperform linear techniques in their ability to capture the complex 
dynamics of fMRI. Finally, our particular use of Brodmann areas and linear methods represent only 
a first step towards combining prior knowledge of broad regional brain function with the construction 
of models for the decoding of natural experience. Despite the relative simplicity, an entry based on 
this approach scored within the top 5 of the EBC2007 competition |3J. 
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